Three-dimensional recording and display system using near- and distal-focused images

ABSTRACT

Methods and apparatuses for providing simulated three-dimensional images on a two-dimensional display screen without the use of special filters or overlays on the display or special eyewear. Images having different focal points are displayed as pictures and rapidly alternated, such that the human eye automatically adjusts to focus on each picture in succession. As the pictures are presented sequentially, the viewer may perceive that the display shows images at different depth planes. The number of depth planes may vary between embodiments.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of co-pending U.S. patent application Ser. No. 12/636,570, filed 11 Dec. 2009, entitled “Three-Dimensional Recording and Display System Using Near- and Distal-Focused Images,” the entirety of which is hereby incorporated by reference for all intents and purposes.

TECHNICAL FIELD

The technical field relates generally to display systems and methodologies capable of simulating three-dimensional images, and more particularly to display systems and methodologies capable of simulating three-dimensional images through the use of near-focused and distal-focused images, as well as systems and methodologies capable of capturing three-dimensional images through the use of adaptive aperture and/or focal settings in an image capturing device or system.

BACKGROUND

Television in its familiar, two-dimensional display format has existed since the 1930s, and in even earlier incarnations since the late 1800s. Films have existed for an even longer period. Despite many advances since their inceptions, television and film technology have largely been confined to two-dimensional displays.

Certain modern technologies attempt to simulate three-dimensional displays on a flat two-dimensional surface (such as a television or a movie screen) through the use of specialized eyewear. Other attempts to simulate three-dimensional displays rely on placing overlays on a display surface or device, while still others require specialized screens integrated into the display device. Given the proliferation of televisions, projectors and other video display devices in the household, many consumers may be reluctant to purchase new equipment to view simulated three-dimensional programs.

DESCRIPTION OF THE FIGURES

FIG. 1A illustrates a sample system for capturing images in a sample environment.

FIG. 1B depicts a group of frames that, taken together, form a group of pictures or “GOP.”

FIG. 2 is a flowchart depicting operations that may be undertaken by the embodiment of FIG. 1A.

FIG. 3 depicts an alternative embodiment of a camera 300 that may be used to capture multiple series of images, each with a different depth plane.

FIG. 4 depicts a generalized environment permitting transmission of an output stream 400 for viewing by an end user.

FIG. 5 depicts three pictures that may be shown in sequence on a display device to simulate a three-dimensional image.

SUMMARY

One embodiment takes the form of a method for creating an output stream, including the operations of: focusing on a first focal point; capturing a first series of images having a first depth of field corresponding to the first focal point; focusing on a second focal point; capturing a second series of images having a second depth of field corresponding to the second focal point, the second depth of field different than the first depth of field; placing at least a portion of the second series of images after a first portion of the first series of images; and placing a second portion of the first series of images after the at least a portion of the second series of images; wherein the first portion of the first series of images, the at least a portion of the second series of images, and the second portion of the first series of images form an output stream.
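
By way of a non-limiting illustration only, the placement operations recited above might be sketched in a few lines of Python. The function name, the list-of-frames representation and the split point are assumptions for illustration, not part of the described method.

```python
# Illustrative sketch only: series are plain lists of frames, and the split
# point dividing the first series into its two portions is arbitrary.
def assemble_output_stream(series_a, series_b, split):
    # First portion of series A, then the portion of series B,
    # then the second portion of series A.
    return series_a[:split] + list(series_b) + series_a[split:]

stream = assemble_output_stream(["A1", "A2", "A3", "A4"], ["B1", "B2"], 2)
# -> ["A1", "A2", "B1", "B2", "A3", "A4"]
```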

Another embodiment takes the form of a method for creating an output stream, including the operations of focusing an image capturing device having an aperture and at least one lens on a first focal point, capturing a first image of a scene having a first depth of field at the first focal point, adjusting the aperture of the image capturing device from a first aperture to at least one second aperture, and capturing at least one second image of the scene at the first focal point for each of the at least one second aperture, wherein the first aperture and the at least one second aperture result in the capturing of the scene at multiple depths of field corresponding to the first focal point.

Another embodiment takes the form of a method for displaying an output stream, including the operations of receiving the output stream; displaying a first image in the output stream, the first image corresponding to a first depth of field; after a first transition time, displaying a second image in the output stream, the second image corresponding to a second depth of field; after a second transition time, displaying a third image in the output stream, the third image corresponding to the first depth of field, wherein the first and second transition times permit the human eye to refocus between the first and second depths of field.

Still another embodiment takes the form of an apparatus for capturing multiple series of images, including: a first image capture element; a second image capture element; a first lens optically coupled to the first image capture element; a second lens optically coupled to the second image capture element; and a housing enclosing at least the first image capture element and second image capture element.

DETAILED DESCRIPTION

I. Introduction and Definitions

Generally, embodiments described herein may provide simulated three-dimensional pictures on a display device, as defined further herein below, such as a television, computer monitor, heads-up display, movie theater screen, video eyewear, or the like. For at least one embodiment, the simulated three-dimensional pictures are created by presenting multiple image captures of a scene at a given point of time (hereinafter, each image capture a “frame” and a collection of image captures for a given scene, at a given point in time, a “picture”), wherein each of the frames may vary from preceding or succeeding frames in focal point, luminosity, and/or a depth of field corresponding to any given focal point, and wherein each picture may vary, in terms of a picture's aggregate frame or frames, from any preceding or succeeding picture in time, focal length, luminosity, depth of field, motion presence or absence, and the like. Further, each image captured of a scene, and the corresponding pictures and frames for such capture, may vary throughout a presentation of the scene by focal distance, perceived angle of view (e.g., left or right of a given reference location), elevation (e.g., up or down from a given reference plane), and the like.

Each of the frames for a given picture may be presented in any given sequence, at any given periodicity and for any given duration in conjunction with the presentation of one or more pictures. Variations in any given scene presentation may occur in terms of focal point, luminosity, depth of field and/or motion presence or absence, as accomplished by frame-to-frame and/or picture-to-picture variations in the foregoing. Such variations desirably result in the human eye automatically adjusting to them, so that the human brain perceives a given image and/or a given scene in varying levels of detail and/or motion and thereby renders a perceived three-dimensional view of the given image and/or scene.

As the pictures and the frames corresponding thereto are presented, the viewer may perceive that the display shows an image at different depths of field (referred to herein as “depth planes”) as well as displaying the image, in the case of motion video, as containing corresponding motion. The number of depth planes may vary between embodiments and may vary based upon the presence or absence of motion between pictures and, when such motion is present, the rate of change in visually perceptible image elements between successive pictures.

Continuing the overview, an appropriately configured image capturing device (hereinafter, a “camera”) may capture images (either still or video) of a scene at different focal points, luminosity and/or depth of field. For example, the camera may film or otherwise capture an image at a first depth plane and simultaneously, or sequentially, capture the image at a second depth plane. Typically, each focal point lies within a corresponding depth plane. Such image captures, at the corresponding depth planes, can occur on a frame-by-frame and/or picture-by-picture basis. Such image capture may also, as desired, account for the presence or absence of motion in the scene.

Continuing the example, the image may be captured such that the foreground is in focus in one series of frames and the background in focus in a second series of frames. Thus, it can be seen that multiple objects, although not arranged in a planar fashion, may be in focus in a single depth plane.

The first and second series of frames may be interleaved to form an output stream for display on the display device. In this manner, the display device, when showing the output stream, may show still or video frames and/or pictures of an image that have different depth planes due to the changing focal points. In effect, for an output stream having two series of frames, pictures and/or images, the stream may vary between the first and second focal points, thereby creating the illusion of a three-dimensional image. Embodiments may interpolate intermediate frames for inclusion in the output display, to smooth the transition between the first and second series of frames and to smooth transitions between pictures and/or images.
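
As a rough, non-limiting illustration of the interleaving and interpolation just described, the Python sketch below alternates frames from two series and blends adjacent frames to stand in for interpolated intermediate frames. Frames are assumed to be numpy arrays, and the simple weighted average is only a placeholder for whatever interpolation an embodiment actually uses.

```python
import numpy as np

def blend(frame_a, frame_b, alpha=0.5):
    # Placeholder interpolation: a weighted average of two frames.
    return ((1 - alpha) * frame_a + alpha * frame_b).astype(frame_a.dtype)

def interleave_with_transitions(series_a, series_b):
    stream = []
    for fa, fb in zip(series_a, series_b):
        # Frame from depth plane A, an interpolated transition frame,
        # then the corresponding frame from depth plane B.
        stream.extend([fa, blend(fa, fb), fb])
    return stream
```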

It should be appreciated that the three-dimensionality of the aforementioned image may be limited by the number of series of frames (e.g., number of focal points) captured and interleaved to create the output stream. Thus, the more frames captured at varying focal points, luminosity and/or depth of field, the finer the granularity of the three-dimensional effect that may be achieved on the display device. It should also be appreciated that the output stream may be constructed such that multiple images and/or pictures from an image capture of a scene are shown together. That is, there is no requirement that only one picture in a row come from a particular series. These concepts will be addressed in greater detail below.

As used herein, the term “display device” generally refers to any electronic device capable of displaying visual information, such as a television, projector, computer display, display surface of a mobile device, heads-up display, video eyewear, and so forth. A “camera” encompasses any device capable of capturing still images and/or video images, such images including one or more pictures which include one or more frames. An image may be captured in the digital or analog domains and converted into a proper format for presentation to a viewer. As discussed above, a “frame” refers to a single instance of an image or video, whether a frame, field, or otherwise. Multiple frames may be combined to create a video sequence or a still image persisting for a length of time, i.e., a picture. Multiple pictures may be combined to create a “movie.” A “movie,” for purposes of this document, includes both still pictures and video.

An “output stream,” as used herein, covers any visual data created from multiple series of frames and/or pictures, where each series may correspond to a different depth plane, focal point, and/or luminosity. Output streams are typically ultimately intended for display on some form of display device, although they may be digitized, multiplexed and otherwise processed prior to display. Both analog film and digital data may be an output stream in different embodiments.

II. Characteristics of the Human Eye and Display Devices

Generally, the human eye controls its focal point through a series of interconnected actions, known as vergence, accommodation and miosis/mydriasis. These three actions operate together to focus the eye on a particular object, at a particular distance, thus defining the focal point of the eye. Embodiments described herein may take advantage of the focusing and refocusing of the eye to simulate a three-dimensional viewing experience through manipulation of images shown on a two-dimensional display device.

“Vergence” refers to the simultaneous movement of both eyes in opposite directions, which occurs to provide binocular vision. Convergence is the motion of both eyes towards each other in order to look at an object closer than the eyes' current focal depth, while divergence is the motion of both eyes away from each other to look at an object further away. Generally, a person's eyes rotate around a vertical axis when looking at an object in order to keep the object in the center of the retinas. Thus, convergence is typically a rotation of the eyes toward one another and divergence is a rotation away.

“Accommodation” is the act of refocusing the eyes through changing the curvature of the lens in each eye. A ciliary muscle in the eye may act to apply pressure to or relax the lens, thereby assisting in changing the eye's focus from far to near. Typically, accommodation is accompanied by vergence, and vice versa.

The constriction of the pupils in the eye is called miosis, while the dilation of the pupil is called mydriasis. In addition to controlling the amount of light that enters the eye, the acts of miosis/mydriasis may assist in focusing the eye. Essentially, contracting or dilating the pupil may act to change the aperture of the eye. As aperture changes, the depth of field of the eye may likewise change.

Continuing the discussion, vergence across a 10 degree angle may occur in about 40 milliseconds, with latency under 200 milliseconds. Likewise, accommodation may occur in approximately 560 to 640 milliseconds, with a latency of approximately 350 milliseconds. It should be noted that the speed of the eye's operations, as well as the latencies set forth above, may vary from individual to individual as well as with the nature of the visual stimulus. For example, accommodation may begin more quickly (e.g., with reduced latency) if following a saccade. Accordingly, these numbers are provided for illustration only.

Generally, current display devices operate at a variety of refresh rates. Display devices conforming to the NTSC video standard output an approximately 60 Hz signal of roughly 60 interlaced fields, or 30 frames, per second. Display devices conforming to the PAL video standard use a 50 Hz refresh rate having 50 interlaced fields, or 25 frames, per second. Most movie projectors display 24 frames per second. By contrast, many display devices accepting digital output streams may operate at higher frame rates. For example, LCD computer monitors often have a 60+ Hz refresh rate and may display one frame per refresh. Further, LED monitors commonly have a minimum refresh rate of 240 Hz and can be obtained with refresh rates as high as 2000 Hz. Display devices may be configured to operate at even faster refresh rates and thus provide even more frames per second.

III. Capturing Images Having Differing Depth Planes

FIG. 1A illustrates a sample system for capturing images in a sample environment. A camera 100 includes a lens 105 with an adjustable aperture. As the aperture adjusts, the depth plane in focus for the camera 100 varies. Thus, the camera 100 may focus on a man at focal point Fa 120, a tree at focal point Fc 135, anywhere in between, and so on. The camera 100 may have the lens 105 aperture set to place the man in the depth plane at a filming time T1. The aperture may be adjusted such that at filming time T2 the tree is in the depth plane. Depending on the aperture, distance between the man and tree, and other factors, the man may be outside the depth plane that includes the tree. At a time T3, the camera may return to the depth plane established by the focal point Fa. Thus, the aperture of the lens 105 may change at a desired rate and periodicity during image capture, such that the focal point being filmed transitions. In this manner, multiple series of frames may be captured rapidly, each at a different effective depth of field distance and each having a different depth plane. In addition to creating different depth planes through aperture changes, the camera 100 may be re-focused while the aperture is maintained in order to capture images at differing depth planes.

In between times T1 and T2, for example when transitioning from focal point Fa 120 to focal point Fb 125, the camera 100 may capture frames at intermediate depth planes, such as planes FA1 and FA2, as shown in FIG. 1B. These frames may be grouped into a picture, such as picture P2. The number of these intermediate depth planes may vary; each typically corresponds to a unique series of frames, which may be grouped as pictures, such as picture P3, in addition to those taken at focal points Fa and Fb. Further, a series of frames may be taken at a point Fo nearer the camera 100 than focal point Fa 120 and grouped, such as picture P1, or further from the camera than focal point Fc 135, to provide additional series of frames and pictures defined on additional depth planes. For purposes of this example, presume the camera captures a third series of frames at a depth plane defined by focal point Fb 125, which is in between focal points Fa and Fc.

As further shown in FIG. 1B, a picture (e.g., P1) including a series of frames (e.g., FO, FO1A, FO2A and FA) may be further grouped with other pictures (e.g., P2 and P3) to provide a group of pictures, or “GOP” (e.g., GOP 1A). A GOP may be further combined with successive groups of pictures (e.g., GOP1#) to provide motion compensation information between GOPs. In one embodiment, a suitable number of GOPs can be provided per second to accomplish motion picture and 3D picture generation while presenting a movie with acceptable image quality.

It should be noted that, because the camera 100 is not moving but only adjusting the aperture of the lens 105, the size, objects in, and composition of the captured images in both series of pictures are roughly identical. In this example, the objects in focus change, but whether those objects are in the captured image does not. Should objects move or the camera 100 move, then the images in each series of pictures may well vary.

It should also be noted that the time lapse between times T1 and T2 need not equal that between times T2 and T3. That is, the camera 100 may linger on a particular focal point longer than another. Similarly, the camera may devote a longer time to filming at one depth plane than another. Likewise, a camera may be configured to repeat frames, pictures, and GOPs in any desired sequence to accomplish the generation of an image having certain 3D and, as desired, motion characteristics. As an example, the camera may shoot two images in the first series of pictures at focal point Fa 120, then three images in the second series of pictures at focal point Fb 125, and continue this ratio as long as desired. It will also be appreciated that other image capture ratios may be used, including even ratios. Generally, the higher the proportion of the ratio associated with a certain series of images, the more emphasis given to that depth plane in the three-dimensional effect described below.
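
The 2:3 ratio of this example might be expressed as in the following non-limiting sketch, which simply repeats the stated ratio until both series are exhausted; the helper name and the list representation are illustrative assumptions.

```python
from itertools import islice

def ratio_sequence(series_a, series_b, ratio=(2, 3)):
    # Take ratio[0] frames from series A (e.g., focal point Fa), then
    # ratio[1] frames from series B (e.g., focal point Fb), repeating
    # until both series run out. Defaults to the 2:3 ratio of the example.
    it_a, it_b = iter(series_a), iter(series_b)
    stream = []
    while True:
        chunk = list(islice(it_a, ratio[0])) + list(islice(it_b, ratio[1]))
        if not chunk:
            break
        stream.extend(chunk)
    return stream
```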

FIG. 2 is a flowchart depicting operations that may be undertaken by the camera 100 of FIG. 1A, or another embodiment. Initially, in operation 200, the aperture for each series of frames is set. This may be done, for example, by a camera operator. By specifying the aperture setting used to create each series of frames, the depth planes may be defined and the camera operator need not manually adjust the camera aperture. It should be appreciated that the length of time for which the camera 100 maintains a particular aperture setting may also be specified in operation 200. Thus, the camera may adjust the lens aperture according to a particular timing scheme, if desired.

In operation 205, the camera 100 captures images at a first aperture setting, thereby creating at least a portion of a first series of frames.

In operation 210, the embodiment determines if a time interval during which the camera captures images at the first depth plane has elapsed. With respect to the example of FIG. 1A, this would be the duration between times T1 and T2. Returning to FIG. 2, the time checked in operation 210 may be set during operation 200. In the event this time has not elapsed, the embodiment returns to operation 205. In certain embodiments, a number of captured frames may be set instead of a time.

In the event the embodiment determines the appropriate time has passed in operation 210, operation 215 is executed. In operation 215, the aperture of the lens 105 is adjusted to focus the camera 100 on the second depth plane (for example, the depth plane determined by the focal point Fb 125 in FIG. 1A). The camera 100 may then capture at least a portion of the second series of frames as part of operation 215.

In operation 220, the embodiment determines if a second time interval has elapsed. This second time interval, which again may be specified in operation 200, represents the length of time during which the camera 100 captures images in the second depth plane. As with operation 210, a certain number of frames may be specified instead of a temporal duration. Presuming this second interval has not elapsed, the embodiment continues to capture images in operation 215.

If, however, the second time interval has elapsed, then operation 225 is executed and the embodiment determines whether or not the image capture sequence is complete. If not, the embodiment returns to operation 205. Otherwise, the embodiment executes operation 230.

In operation 230, the embodiment may store each series of frames (presuming the embodiment is not solely a film-based camera). Typically, although not necessarily, each series of frames is stored separately from one another. It is to be appreciated that any number of frames may be stored separately or in groups of frames, pictures, GOPs or otherwise. By storing each series separately, an editor or other content composer may choose or create certain simulated three-dimensional effects without requiring a scene be re-shot to obtain additional frames at certain depth planes. Each frame in each series may be tagged with a timestamp so that an editor or other content composer may easily cross-reference frames when establishing simulated three-dimensional effects.

It should be noted that an operator may generally perform the foregoing operations manually, rather than programming a camera or camera controller to perform the operations. In such a case, the operator may omit operation 200 as redundant. It should also be noted that an external controller, such as a computing device, may be provided to execute the method shown in FIG. 2. The external controller may electronically connect to the camera 100 and execute the foregoing functions.

FIG. 2 has been described with respect to a first and second series of frames, e.g., with respect to two depth planes. It should be appreciated that three or more depth planes may be defined and captured, each with its own series of frames, pictures, and GOPs, as the case may be. In such a case, a third, fourth, . . . Nth time interval and/or aperture setting may be defined, and operations similar to those in operations 215-220 executed for each corresponding series of frames. Typically, operation 225 is performed after all such iterations are completed. It should be appreciated, however, that more complex aperture and timing specifications may be made, which may cause the order of such iterations to vary by embodiment.
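
One way to picture the generalized FIG. 2 loop for N depth planes is the sketch below. The camera interface (set_aperture, capture_frame) and the per-plane interval list are hypothetical stand-ins for whatever operation 200 specifies; the sketch is illustrative only.

```python
import time

def capture_sequence(camera, apertures, intervals, passes=1):
    # One list of frames per depth plane, stored separately per operation 230.
    series = [[] for _ in apertures]
    for _ in range(passes):  # repeat until the capture sequence is complete (225)
        for plane, (aperture, interval) in enumerate(zip(apertures, intervals)):
            camera.set_aperture(aperture)        # operations 205/215
            deadline = time.monotonic() + interval
            while time.monotonic() < deadline:   # the time checks of 210/220
                series[plane].append(camera.capture_frame())
    return series
```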

Further, multiple aperture settings may be used to define intermediate depth planes between, for example, two primary depth planes. Returning briefly to FIG. 1A, the depth plane defined by the focal point Fb is one such intermediate depth plane. The series of pictures captured at these intermediate fields may be used to interpolate or transition between the two primary depth planes when creating an output stream, as described below.

In certain embodiments, it may be useful to consider the playback speed of the output stream when capturing series of frames. For example, the overall speed at which all series of frames are captured may be 60 Hz to match the playback rate of NTSC-compliant display devices. Alternative embodiments may capture a different number of frames per second, such as 24 or 50, to match the display capabilities of film or PAL display devices, respectively. In these embodiments, the camera 100 may cycle sufficiently rapidly through the various depth planes to ensure that each series of frames is captured at least at the appropriate rate. For example, presume a camera can modify its aperture setting 400 times per second (to use an arbitrary number) and there are three separate depth planes, with the third depth plane being captured twice as long as the first and second depth planes (e.g., a 1-1-2 timing). In such an example, the camera may capture the first and second series of frames at 100 Hz and the third series of frames at 200 Hz, then pull down the frame rate as necessary to match the 60 Hz output. Alternative embodiments may capture frames at a multiple or fraction of a playback rate to minimize or eliminate the need for pulldown conversion.
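
The arithmetic of this 1-1-2 example can be sketched as follows. The naive frame-dropping pulldown shown here is only illustrative; practical pulldown schemes are more elaborate.

```python
def plane_rates(total_rate_hz, weights):
    # Apportion the camera's total capture rate among depth planes
    # according to a timing ratio such as 1-1-2.
    share = total_rate_hz / sum(weights)
    return [share * w for w in weights]

def naive_pulldown(frames, capture_hz, playback_hz):
    # Keep roughly every (capture_hz / playback_hz)-th frame so the
    # series matches the playback rate.
    step = capture_hz / playback_hz
    return [frames[int(i * step)] for i in range(int(len(frames) / step))]

print(plane_rates(400, [1, 1, 2]))  # -> [100.0, 100.0, 200.0]
```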

In other embodiments, each series of frames may be captured at a variable rate. For example, if the output stream to be created from the series of frames is to be played on a television, then it may be assumed the playback speed is approximately 60 fields per second, or 60 Hz. The camera may be set to capture no more than 60 frames per second across all depth planes, with the specified timing of operation 200 indicating how many frames are captured at each aperture setting.

Generally, it may be useful to keep the image capture rate of each series of frames above approximately 16 frames per second, which is about the threshold at which the human eye discerns flicker or jerkiness in movies, television programs and other content that uses motion blur techniques to smooth transitions between frames.

IV. Alternative Embodiment

FIG. 3 depicts an alternative embodiment of a camera 300 that may be used to capture multiple series of images, each with a different depth plane. Unlike the camera 100 shown in FIG. 1A, this camera 300 includes multiple lens arrays 305, 310, 315. The aperture of each lens array may be independently set, such that each lens focuses on a different focal point 120, 125, 135 and thereby captures its series of pictures in a different depth plane. It should be appreciated that the lens arrays may facilitate image capture through conventional film or in a digital format. Thus, the lens arrays may each be coupled to a distinct digital camera element, such as a charge-coupled device image sensor or other appropriate digital sensor.

In operation, the camera 300 acts similarly to the camera 100 of FIG. 1A. Since it includes multiple lenses 305, 310, 315 that may be set to different apertures, the camera may simultaneously record each series of frames without switching between different focal lengths. Accordingly, certain operations set out in FIG. 2 are not performed by the camera 300 of FIG. 3. For example, operations 210 and 220 may not be executed. Likewise, in operation 200, timing information may not be provided.

It should be appreciated that each lens 305, 310, 315 may be arranged equidistant from a midpoint on the face of the camera 300. Thus, although the series of frames captured through each lens may be slightly offset from one another, translation of the individual frames to a common point may be easily accomplished. This may be done through a simple shifting of X-Y coordinates according to known values, or a more complex translation may account for variations in the image due to differences in angle between the lenses.
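
The simple X-Y shift mentioned above might look like the following numpy sketch, where the per-lens pixel offsets (dx, dy) are assumed to be known in advance from the camera geometry; the more complex angular correction is not shown.

```python
import numpy as np

def shift_to_common_point(frame, dx, dy):
    # Translate a frame by (dx, dy) pixels toward the common reference
    # point; edges exposed by the shift are filled with zeros (black).
    shifted = np.zeros_like(frame)
    h, w = frame.shape[:2]
    src = frame[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted
```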

V. Creating and Transmitting an Output Stream

Once the various series of frames are captured, they may be combined into a single output stream to simulate three-dimensional viewing, such as that provided through binocular vision. The process of creating the output stream is described herein.

Initially, if the camera captured each series of frames in the sequence desired, then no intervention by an editor is necessary. Instead, the final sequence of the various series of frames, pictures and/or GOPs has already been captured, and the data captured by the camera 100 may be subjected to compression and transmission, as described below. This may be especially advantageous with live or near-live performances, such as sporting events, in that they may be captured and a simulated three-dimensional effect provided with little or no delay.

Presuming the camera 100 stored each series of frames separately as described in operation 230 of FIG. 2, an editor or other content creator may employ the various series to create the content stream. For simplicity, the term “editor” will be used herein but is intended to cover any person, entity, device or system that manually, semi-automatically and/or automatically creates an output stream from multiple series of frames, pictures and/or GOPs at different depths of field, focal points, luminosities, levels of motion compensation and/or combinations of the foregoing. The editor may review the captured frames, pictures, and/or GOPs in each series and choose how many different depth planes are to be included in each segment of the output stream.

The editor may also choose how long each series of frames, pictures, and/or GOPs is shown, or how many images in each series are shown before transitioning to another series. In essence, the editor may weave together images from each of the depth planes as desired. In certain embodiments, the editor may wish to create the output stream to return to each series of frames within 1/25th of a second. Since afterimages generally linger in the human visual system for approximately 1/25th of a second, transitioning between each series of frames, and thus each depth plane, within this time may facilitate the illusion of persistence of vision at each such plane. This, in turn, may enhance the three-dimensional illusion experienced by a viewer.
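
The editor's 1/25-second constraint can be checked mechanically, as in the sketch below, which assumes a hypothetical schedule given as a list of depth-plane identifiers, one per displayed frame.

```python
def revisits_within_window(schedule, frame_duration_s, window_s=1 / 25):
    # True if no depth plane goes unseen for longer than the window.
    last_seen = {}
    for i, plane in enumerate(schedule):
        now = i * frame_duration_s
        if plane in last_seen and now - last_seen[plane] > window_s:
            return False
        last_seen[plane] = now
    return True

# Two planes alternating at 60 frames per second revisit each plane
# every 1/30 s, comfortably inside the 1/25 s window.
print(revisits_within_window([0, 1] * 30, 1 / 60))  # True
```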

During this process, the editor may wish to ensure each series of frames maintains a minimum display rate, such as a certain number of frames per second. In this manner, the resulting output stream may minimize or eliminate flicker that could otherwise occur either during transition between series of frames and/or pictures or between the frames and/or pictures showing a single depth plane. Nonetheless, there is no upper or lower limit on the number of images that may be taken from any single series of frames and/or pictures and displayed sequentially. Generally, the more images from a given series of frames and/or pictures displayed per second, the more emphasis that corresponding depth plane may have in a simulated three-dimensional view.

In one embodiment, once the editor has assembled the images in a desired output stream, they may be compressed. In other embodiments, frames, pictures and/or GOPs may be compressed by the camera, decompressed as necessary for image editing, and recompressed to provide compressed frames, pictures, GOPs and/or movies. Generally, the individual frames and fields in the output stream may be compressed according to any standard compression scheme.

For example, the output stream may be separated into I-frames, P-frames and B-frames, with P-frames and B-frames encoding only data that changes with respect to the reference I-frames. The use of I-frames, P-frames and B-frames is well known, especially with respect to MPEG-1 and MPEG-2 video coding and compression. Similarly, the output stream may be segmented into macroblocks within each frame, or slices within each frame under the H.264/MPEG-4 codec. Compression may occur at the frame, picture and/or GOP level, as desired by system characteristics.

During compression, the embodiment may determine P-frames (or macroblocks, or slices) and/or B-frames (or macroblocks, or slices) that not only interpolate between images in the same series but also compensate for changes between series, or depth planes. Thus, just as a series of frames may be compressed to account for motion of an object shown in the frames, so too may compression operate to reduce the data size of frames that are adjacent in the output stream but display different depth planes. For example, and with reference to FIG. 1B, frame FO might be encoded as an I-frame, frame FA as a B-frame, and frames FO1A and FO2A as P-frames. Similarly, picture P1 might be further compressed with respect to pictures P2 through P3, such that GOP 1A is representative of an I-frame (of a GOP) and GOP 1# is representative of a B-frame (of a GOP). One of ordinary skill in the art will appreciate that lossless or near-lossless compression of frames, pictures and GOPs may be achieved by the repetition of frames (e.g., FB occurs in both pictures P2 and P3), pictures and/or GOPs throughout a movie. Accordingly, as the output stream rapidly transitions between images from different series, the images may be compressed according to known techniques based on the changes in images resulting from differences in depth planes or focal points.
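
The cross-plane difference coding described above can be illustrated conceptually. The sketch below is not MPEG or H.264 machinery; it is merely a signed per-pixel difference between two equal-sized frames from different depth planes, which captures the idea that nearly identical frames compress well against one another.

```python
import numpy as np

def encode_delta(reference, frame):
    # Store only the signed per-pixel difference from the reference frame.
    return frame.astype(np.int16) - reference.astype(np.int16)

def decode_delta(reference, delta):
    # Reconstruct the frame from the reference plus the stored difference.
    return (reference.astype(np.int16) + delta).astype(np.uint8)

reference = np.random.randint(0, 256, (8, 8), dtype=np.uint8)  # e.g., frame FO
frame = reference.copy()
frame[4, 4] ^= 0xFF  # same scene, a small focus-related change
assert np.array_equal(decode_delta(reference, encode_delta(reference, frame)), frame)
```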

Changes in depth planes and/or focal points may occur between pictures and/or GOPs, or even between frames, as necessary or desired to achieve a particular simulated three-dimensional effect.

An output stream or movie, as described herein, may contain multiple series of frames, pictures and/or GOPs, each of which may convey motion to a viewer. Insofar as an editor or content provider may wish to avoid flickering, strobing or stuttering effects in the output stream, it may be advantageous for the output stream to have a higher frame rate than a standard television, film or other audiovisual signal. Thus, some output streams may be encoded at rates ranging from 60 non-interlaced frames per second to rates as high as 2000 frames per second. By increasing the frame rate of the output stream, images from multiple series of frames, pictures and GOPs and showing multiple depth planes may be used without inducing undesired display effects. Although this may increase the bandwidth required to transmit the output stream across a network, compression (as described above) may reduce the overall bandwidth.

It should be noted that compression of the output stream is optional. Likewise, it should be noted that the output stream may be configured to be displayed on a standard or conventional two-dimensional display device, thereby simulating three-dimensional images on the display device. The pictures and/or GOPs may be used to create a single output stream that can be decoded by a receiver and displayed on the display device in this fashion. Thus, it should be understood that multiple output streams are not necessary to implement embodiments described herein.

Pictures with varying depths of field may be captured (and frames created) based not only on a changing aperture of a camera 100, but also through dynamically refocusing a camera. In this manner, the depth of field may vary with focus although the lens aperture does not change. For example, when tracking a moving object coming towards the camera, the camera may stay focused on the object. Thus, as the object moves, the depth of field changes with the camera's focus. Similarly, a first picture may have a wide focus that maintains both a car and a background in focus, while a second picture may have a tighter focus on the car, thus placing the background out of focus. Although these images may have the same focal point (e.g., on the car), the depth of field may be different.

By capturing pictures having these different depths of field and/or foci, the output stream may be created in a manner similar to that described above with respect to changing focal points achieved via aperture changes. It should be appreciated that a combination of pictures having different foci and aperture settings may be employed together to create an output stream.

FIG. 4 depicts a generalized environment permitting transmission of an output stream 400 for viewing by an end user. The output stream 400, once compressed, digitized, multiplexed, and/or otherwise configured, may be transmitted across a network 415 from a content provider 405 to a receiver 410. The network may be any suitable type of network, such as a satellite system, cable system, the Internet, any other wired, wireless or hybrid network, and so on. The content provider 405 may exercise control over the network 415 in certain embodiments and may not in others. Thus, for example, the content provider may be a satellite provider transmitting the output stream across a proprietary satellite system. In another embodiment, the content provider 405 may be a server transmitting the stream across the Internet. As yet other options, the output streams discussed herein may be encoded onto a storage medium such as a Blu-ray disc, digital versatile disc, and so on.

The receiver 410 may be any type of device configured to accept, recognize and/or process the output stream 400. For example, the receiver may be a set-top box, cable box, computer, handheld device including a personal digital assistant or mobile phone, and so on. Typically, the receiver is connected to or integrated with a display device 420.

VI. Displaying and Viewing an Output Stream

The receiver 410 decodes the output stream 400 and sends it to the display device 420 for viewing by the end user. The displayed output stream shifts between images at different depth planes. As the output stream shifts in this fashion, the end user's eyes may refocus through vergence and/or accommodation to adjust to the change in focal points. Similar refocusing effects may be achieved by changing the luminosity of all or part of a displayed output stream, as discussed in more detail below.

By shifting quickly enough between depth planes, the user's eyes may perceive multiple depth planes simultaneously, just as with binocular vision. This, in turn, may simulate three-dimensional images on the two-dimensional surface of the display device 420. Insofar as the end user's eyes may adjust as quickly as 40 milliseconds to the differing depth planes presented by the output stream, the user's brain may be fooled into believing that the various depth planes are simultaneously viewable and thus generate the illusion of depth.

It should be appreciated that the frames having varying depth planes nonetheless display substantially the same image, but with different elements of the image in focus. This assists in creating the three-dimensional effect on the display device 420. For example, FIG. 5 depicts three frames 510, 520, 530 that may be shown in sequence on a display device 420. Each frame shows substantially the same elements, which in this example are a man 500 and a tree 505. The man and the tree are captured in different series of frames and so are in different depth planes. Generally, the image 510 corresponds to focal point Fa 120 of FIG. 1A, image 520 corresponds to focal point Fb 125, and image 530 corresponds to focal point Fc 135.

At time Ta, image 510 is displayed. In this image, the man 500 is in focus while the tree 505 is not. At time Tb, image 520 is shown on the display device 420. Insofar as this corresponds to intermediate focal point Fb, both the man 500 and tree 505 may be in focus. Next, at time Tc, image 530 is shown to the end user and the tree is in focus, but the man is not, thus corresponding to focal point Fc.
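
The display sequence at times Ta, Tb and Tc amounts to a timed loop, as in the non-limiting sketch below. The caller-supplied show() function is a hypothetical stand-in for the display device, and the transition time is chosen to be long enough for the eye to refocus.

```python
import time

def play(output_stream, show, transition_time_s=0.04):
    # Present each image, pausing so the viewer's eyes can refocus
    # (vergence/accommodation) before the next depth plane appears.
    for image in output_stream:
        show(image)
        time.sleep(transition_time_s)

# Example with a stand-in display function:
play(["image_510", "image_520", "image_530"], show=print)
```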

Since the man and tree appear to be the same size in all three images but the depth of field changes, the illusion of three-dimensionality may be experienced by the end user.

It should be noted that a P-frame 515 and/or B-frame 517 may be shown in between the first image 510 and second image 520 to more smoothly transition between the images. Similarly, a P-frame 525 and/or B-frame 527 may facilitate the transition between the second image 520 and third image 530. Although only one P-frame and one B-frame are shown between each pair of images, multiples of either frame type may be used as necessary or desired.

It should also be noted that a standard receiver 410 and display device 420 may be used to decode and display the output stream 400, so long as the decoding speed of the receiver 410 and refresh rate of the display device 420 are sufficient.

VII. Luminosity Adjustments

In certain embodiments, the data for display on a display device may not only include frames having different focal depths and/or aperture changes, but also varying luminosity. The display stream (e.g., the data stream resulting from processing the output stream by a receiver) may include instructions to change the luminosity of the display device, optionally on a frame-by-frame basis. Since luminosity changes may cause the human pupil to constrict or dilate, variances in luminosity may be used to enhance or facilitate the three-dimensional effects described herein.

Generally, modern display devices are capable of changing luminosity in response to a user command or ambient light condition. Embodiments may include a command as part of the display stream to adjust luminosity regardless of changes in ambient light or user input. Thus, additional visual effects may be achieved or the foregoing visual effects enhanced.

Such luminosity changes may be made on a whole-image basis, or on a pixel-by-pixel basis. The latter may be useful, for example, to further emphasize particular elements or objects on a display, such as those upon which a viewer should focus. Luminosity changes may be created digitally by emphasizing or deemphasizing certain pixels in a frame or may be created in an analog fashion through the use of dynamic lighting and/or filtering.
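
A per-pixel luminosity adjustment of the kind described might be sketched as follows, assuming 8-bit frames represented as numpy arrays and a boolean mask (an illustrative assumption) selecting the pixels to emphasize.

```python
import numpy as np

def adjust_luminosity(frame, gain, mask=None):
    # Scale brightness for the whole image, or only the masked pixels,
    # clamping back to the valid 8-bit range.
    out = frame.astype(np.float32)
    if mask is None:
        out *= gain          # whole-image adjustment
    else:
        out[mask] *= gain    # pixel-by-pixel emphasis
    return np.clip(out, 0, 255).astype(np.uint8)
```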

VIII. Conclusion

Embodiments have been discussed that may create, enhance, facilitate or simulate three-dimensional effects through the use of changing focal points, changing apertures, luminosity variances, varying depths of field, and so on. It should be understood that any or all of these methods may be used together to create and/or enhance a simulated three-dimensional image that is displayed on a conventional, two-dimensional display device. Thus, for example, certain embodiments may create and/or process an output stream that includes frames and/or GOPs having differing depths of field, focal points and luminosity, all operating together to fool the human eye into perceiving a three-dimensional image. The exact techniques used to construct the output stream (and, likewise, to use the output stream to create a display stream for display on a conventional two-dimensional display device) may vary by embodiment.

Although the foregoing has described particular systems, embodiments and methods, it should be understood that alternative embodiments may occur to those of ordinary skill in the art upon reading this document. For example, when creating an output stream having multiple depth planes sufficiently far away, individual cameras may be used to capture images in each depth plane and simply placed close to one another. In such an embodiment, each image captured by each camera may be translated to a common origin point. As yet another example, the camera 300 of FIG. 3 may capture a larger field of view through each lens 305, 310, 315 than is shown in the final output stream, and non-overlapping portions of each series of pictures may be cropped. In view of the above disclosure and the ordinary skill of one in the art, the following is claimed:

1. (canceled)
2. A computer-implemented method, comprising: receiving a plurality of frames of a scene in sequence as captured at a plurality of different aperture settings; selecting a particular number of the plurality of frames for inclusion in an output stream; and generating the output stream by combining the particular number of the plurality of frames in a pre-determined sequence.
3. The method of claim 2, comprising generating the output stream so that frame luminosity is varied within the output stream.
4. The method of claim 2, comprising generating the output stream so that luminosity of only a portion of pixels in a particular frame is varied.
5. The method of claim 2, comprising generating the output stream so that a number of frames captured at a first aperture setting are in immediate sequence with a number of frames captured at a second aperture setting.
6. The method of claim 2, comprising generating the output stream so that a number of frames captured at a first aperture setting are interleaved with a number of frames captured at a second aperture setting.
7. The method of claim 2, comprising compressing the output stream for network transmission.
8. The method of claim 2, comprising generating the output stream so that at least one P-frame is inserted between a first frame and a second frame, and the at least one P-frame is encoded based on a difference in depth of field between a first aperture setting and a second aperture setting.
9. The method of claim 2, comprising generating the output stream to include at least 60 image frames per second.
10. The method of claim 2, comprising: capturing by an imaging device the plurality of frames of the scene; and adjusting a focal point setting of the imaging device so that an aperture setting is different at each of a plurality of different focal points.
11. The method of claim 2, comprising: capturing by an imaging device the plurality of frames of the scene; and adjusting focus of the imaging device so that a focus setting is different in at least one of the plurality of different aperture settings.
12. The method of claim 2, comprising outputting for display image frames of the output stream at an interval of 1/25th of a second.
13. The method of claim 2, wherein the particular number of the plurality of frames is equal to a number of the plurality of frames.
14. The method of claim 2, wherein the particular number of the plurality of frames is unequal to a number of the plurality of frames.
15. The method of claim 7, wherein the compressing of the output stream comprises comparing a first image representation received for a first aperture setting with at least one second image representation received for at least one second aperture setting, detecting differences between the first and second image representations, and communicating the differences in the output stream.
16. The method of claim 2, wherein the pre-determined sequence includes a sequence of frames different than the received sequence of frames.